24 research outputs found

    Population Structure and Genetic Diversity in a Rice Core Collection (Oryza sativa L.) Investigated with SSR Markers

    Assessing the genetic diversity and population structure of a core collection helps in using the germplasm and in applying it to association mapping. The objectives of this study were to (1) examine the population structure of a rice core collection; (2) investigate the genetic diversity within and among subgroups of the rice core collection; and (3) identify the extent of linkage disequilibrium (LD) in the rice core collection. A rice core collection of 150 varieties, established from 2260 varieties of Ting's collection of rice germplasm, was genotyped with 274 SSR markers and used in this study. Two distinct subgroups (i.e. SG 1 and SG 2) were detected within the entire population by different statistical methods, in accordance with the differentiation of indica and japonica rice. MCLUST analysis might be an alternative to STRUCTURE for population structure analysis. A subset of 26% of the markers could detect the population structure with precision similar to that of the whole SSR marker set. Gene diversity and MRD between the two subspecies varied considerably across the genome, which might be used to identify candidate genes for traits under domestication and artificial selection in indica and japonica rice. The percentage of SSR loci pairs in significant (P<0.05) LD is 46.8% in the entire population, and the ratio of linked to unlinked loci pairs in LD is 1.06. Across the entire population as well as the subgroups and sub-subgroups, LD decays with genetic distance, indicating that linkage is one main cause of LD. The results of this study provide valuable information for future association mapping using the rice core collection

    iCanPlot: Visual Exploration of High-Throughput Omics Data Using Interactive Canvas Plotting

    Increasing use of high-throughput genomic-scale assays requires effective visualization and analysis techniques to facilitate data interpretation. Moreover, existing tools often require programming skills, which discourages bench scientists from examining their own data. We have created iCanPlot, a compelling platform for visual data exploration based on the latest technologies. Using the recently adopted HTML5 Canvas element, we have developed a highly interactive tool to visualize tabular data and identify interesting patterns in an intuitive fashion without the need for any specialized computing skills. A module for geneset overlap analysis has been implemented on the Google App Engine platform: when the user selects a region of interest in the plot, the genes in the region are analyzed on the fly. The visualization and analysis are amalgamated for a seamless experience. Further, users can easily upload their data for analysis, which also makes it simple to share the analysis with collaborators. We illustrate the power of iCanPlot by showing an example of how it can be used to interpret histone modifications in the context of gene expression

    Intuitive Visualization and Analysis of Multi-Omics Data and Application to Escherichia coli Carbon Metabolism

    Combinations of 'omics' investigations (i.e., transcriptomic, proteomic, metabolomic and/or fluxomic) are increasingly applied to gain a comprehensive understanding of biological systems. Because the latter are organized as complex networks of molecular and functional interactions, the intuitive interpretation of multi-omics datasets is difficult. Here we describe a simple strategy to visualize and analyze multi-omics data. Graphical representations of complex biological networks can be generated using Cytoscape, where all molecular and functional components can be explicitly represented using a set of dedicated symbols. This representation can be used i) to compile all biologically relevant information regarding the network through web link association, and ii) to map the network components with multi-omics data. A Cytoscape plugin was developed to increase the possibilities of both multi-omics data representation and interpretation. This plugin allowed different adjustable colour scales to be applied to the various omics data and performed the automatic extraction and visualization of the most significant changes in the datasets. For illustration purposes, the approach was applied to the central carbon metabolism of Escherichia coli. The obtained network contained 774 components and 1232 interactions, highlighting the complexity of bacterial multi-level regulations. The structured representation of this network represents a valuable resource for systemic studies of E. coli, as illustrated by the application to multi-omics data. Some current issues in network representation are discussed on the basis of this work

    Identifying Cis-Regulatory Sequences by Word Profile Similarity

    Recognizing regulatory sequences in genomes is a continuing challenge, despite a wealth of available genomic data and a growing number of experimentally validated examples. We discuss here a simple approach to search for regulatory sequences based on the compositional similarity of genomic regions and known cis-regulatory sequences. This method, which is not limited to searching for predefined motifs, recovers sequences known to be under similar regulatory control. The words shared by the recovered sequences often correspond to known binding sites. Furthermore, we show that although local word profile clustering is predictive for the regulatory sequences involved in blastoderm segmentation, local dissimilarity is a more universal feature of known regulatory sequences in Drosophila. Our method leverages sequence motifs within a known regulatory sequence to identify co-regulated sequences without explicitly defining binding sites. We also show that regulatory sequences can be distinguished from surrounding sequences by local sequence dissimilarity, a novel feature in identifying regulatory sequences across a genome. Source code for WPH-finder is available for download at http://rana.lbl.gov/downloads/wph.tar.gz
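
The idea of comparing genomic regions to a known regulatory sequence by word composition can be illustrated with a minimal sketch. This is not the WPH-finder code: the abstract does not state the word length or the similarity measure, so the choice of overlapping k-letter words, cosine similarity, and the function names `word_profile`, `cosine_similarity`, and `scan_windows` are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def word_profile(seq, k=3):
    """Count every overlapping k-letter word in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(p, q):
    """Cosine similarity of two word-count profiles (assumed measure)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def scan_windows(genome, known, window, k=3, step=1):
    """Score each genomic window against the profile of a known sequence.

    Returns a list of (start, score) pairs, one per window.
    """
    reference = word_profile(known, k)
    scores = []
    for start in range(0, len(genome) - window + 1, step):
        profile = word_profile(genome[start:start + window], k)
        scores.append((start, cosine_similarity(reference, profile)))
    return scores


if __name__ == "__main__":
    # Toy sequences, invented for this example only.
    hits = scan_windows("TTTTACGTACGTTTTT", "ACGTACGT", window=8)
    best_start, best_score = max(hits, key=lambda hit: hit[1])
    print(best_start, round(best_score, 3))
```

Because no motif is defined in advance, any window whose overall word composition resembles the known sequence scores highly, which matches the abstract's point that the search is not limited to predefined binding sites.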

    Large-Scale Identification of Mirtrons in Arabidopsis and Rice

    A new category of microRNA (miRNA) species called mirtrons, which originate from spliced introns of gene transcripts, has recently been discovered in animals. However, only one putative mirtron, osa-MIR1429, has been identified in rice (Oryza sativa). We employed a high-throughput sequencing (HTS) data- and structure-based approach to perform a genome-wide search for mirtron candidates in both Arabidopsis (Arabidopsis thaliana) and rice. Five and eighteen candidates were discovered in the two plants, respectively. To investigate their biological roles, the targets of these mirtrons were predicted and validated based on degradome sequencing data. The results indicate that the mirtrons could guide target cleavage to exert their regulatory roles post-transcriptionally, which needs further experimental validation

    Human Gene Coexpression Landscape: Confident Network Derived from Tissue Transcriptomic Profiles

    This is an open-access article distributed under the terms of the Creative Commons Attribution License. [Background]: Analysis of gene expression data using genome-wide microarrays is a technique often used in genomic studies to find coexpression patterns and locate groups of co-transcribed genes. However, most studies done at a global omic scale are not focused on human samples, and those that are very often include heterogeneous datasets, mixing normal with disease-altered samples. Moreover, the technical noise present in genome-wide expression microarrays is another well-reported problem that is often not addressed with robust statistical methods, and the estimation of errors in the data is not provided. [Methodology/Principal Findings]: Human genome-wide expression data from a controlled set of normal-healthy tissues is used to build a confident human gene coexpression network avoiding both pathological and technical noise. To achieve this we describe a new method that combines several statistical and computational strategies: robust normalization and expression signal calculation; correlation coefficients obtained by parametric and non-parametric methods; random cross-validations; and estimation of the statistical accuracy and coverage of the data. All these methods provide a series of coexpression datasets where the level of error is measured and can be tuned. To define the errors, the rates of true positives are calculated by assignment to biological pathways. The results provide a confident human gene coexpression network that includes 3327 gene-nodes and 15841 coexpression-links, and a comparative analysis shows good improvement over previously published datasets. Further functional analysis of a subset core network, validated by two independent methods, shows coherent biological modules that share common transcription factors. The network reveals a map of coexpression clusters organized in well-defined functional constellations. Two major regions in this network correspond to genes involved in nuclear and mitochondrial metabolism, and investigations of their functional assignment indicate that more than 60% are house-keeping and essential genes. The network displays previously undescribed gene associations and allows some unknown, unassigned genes to be placed in a functional context based on their interactions with known gene families. [Conclusions/Significance]: The identification of stable and reliable human gene-to-gene coexpression networks is essential to unravel the interactions and functional correlations between human genes at an omic scale. This work contributes to this aim, and we are making the validated human gene coexpression networks available to the scientific community, to allow further analyses of the network or of specific gene associations. The data are available free online at http://bioinfow.dep.usal.es/coexpression/. © 2008 Prieto et al. Funding and grant support was provided by the Ministry of Health, Spanish Government (ISCiii-FIS, MSyC; Project reference PI061153) and by the Ministry of Education, Castilla-Leon Local Government (JCyL; Project reference CSI03A06). Peer Reviewed

    The next generation of training for Arabidopsis researchers: Bioinformatics and Quantitative Biology

    It has been more than 50 years since Arabidopsis (Arabidopsis thaliana) was first introduced as a model organism to understand basic processes in plant biology. A well-organized scientific community has used this small reference plant species to make numerous fundamental plant biology discoveries (Provart et al., 2016). Due to an extremely well-annotated genome and advances in high-throughput sequencing, our understanding of this organism and other plant species has become even more intricate and complex. Computational resources, including CyVerse, Araport, The Arabidopsis Information Resource (TAIR), and BAR, have further facilitated novel findings with just the click of a mouse. As we move toward understanding biological systems, Arabidopsis researchers will need to use more quantitative and computational approaches to extract novel biological findings from these data. Here, we discuss guidelines, skill sets, and core competencies that should be considered when developing curricula or training undergraduate or graduate students, postdoctoral researchers, and faculty. A selected case study provides more specificity as to the concrete issues plant biologists face and how best to address such challenges

    Exploring the Switchgrass Transcriptome Using Second-Generation Sequencing Technology

    Background: Switchgrass (Panicum virgatum L.) is a C4 perennial grass and widely popular as an important bioenergy crop. To accelerate the pace of developing high-yielding switchgrass cultivars adapted to diverse environmental niches, the generation of genomic resources for this plant is necessary. The large genome size and polyploid nature of switchgrass make whole-genome sequencing a daunting task even with current technologies. Exploring the transcriptional landscape using next-generation sequencing technologies provides a viable alternative to whole-genome sequencing in switchgrass. Principal Findings: Switchgrass cDNA libraries from germinating seedlings, emerging tillers, flowers, and dormant seeds were sequenced using Roche 454 GS-FLX Titanium technology, generating 980,000 reads with an average read length of 367 bp. De novo assembly generated 243,600 contigs with an average length of 535 bp. Using the foxtail millet genome as a reference greatly improved the assembly and annotation of switchgrass ESTs. Comparative analysis of the 454-derived switchgrass EST reads with other sequenced monocots including Brachypodium, sorghum, rice and maize indicated a 70–80% overlap. RPKM analysis demonstrated unique transcriptional signatures of the four tissues analyzed in this study. More than 24,000 ESTs were identified in the dormant seed library. In silico analysis indicated that there are more than 2000 EST-SSRs in this collection. Expression of several orphan ESTs was confirmed by RT-PCR. Significance: We estimate that about 90% of the switchgrass gene space has been covered in this analysis. This study nearl
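
The RPKM analysis mentioned in the abstract above can be illustrated with the standard definition of the measure (reads per kilobase of transcript per million mapped reads). The abstract does not describe how the authors computed it, so the following is only a sketch of the usual formula; the function name `rpkm`, the contig length, and the per-tissue read counts are invented for illustration and are not data from the study.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count * 1e9 / (length_bp * total_mapped_reads)
    """
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical contig of 535 bp with made-up read counts per library.
    contig_length_bp = 535
    libraries = {
        # tissue: (reads mapped to the contig, total mapped reads in library)
        "seedling": (12, 250_000),
        "tiller": (3, 240_000),
        "flower": (40, 260_000),
        "dormant_seed": (0, 230_000),
    }
    for tissue, (reads, library_size) in libraries.items():
        value = rpkm(reads, contig_length_bp, library_size)
        print(f"{tissue}: {value:.1f}")
```

Normalizing by both transcript length and library size is what makes expression values comparable across contigs of different lengths and across tissue libraries of different sequencing depth.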

    Systems Biology of the Clock in Neurospora crassa

    A model-driven discovery process, Computing Life, is used to identify an ensemble of genetic networks that describe the biological clock. A clock mechanism involving the genes white-collar-1 and white-collar-2 (wc-1 and wc-2) that encode a transcriptional activator (as well as a blue-light receptor) and an oscillator frequency (frq) that encodes a cyclin that deactivates the activator is used to guide this discovery process through three cycles of microarray experiments. Central to this discovery process is a new methodology for the rational design of a Maximally Informative Next Experiment (MINE), based on the genetic network ensemble. In each experimentation cycle, the MINE approach is used to select the most informative new experiment in order to mine for clock-controlled genes, the outputs of the clock. As much as 25% of the N. crassa transcriptome appears to be under clock control. Clock outputs include genes with products in DNA metabolism, ribosome biogenesis in RNA metabolism, cell cycle, protein metabolism, transport, carbon metabolism, isoprenoid (including carotenoid) biosynthesis, development, and varied signaling processes. Genes under the control of the transcription factor complex WCC (= WC-1/WC-2) were resolved into four classes: circadian only (612 genes), light-responsive only (396), both circadian and light-responsive (328), and neither circadian nor light-responsive (987). In each of three cycles of microarray experiments, the data support that wc-1 and wc-2 are auto-regulated by WCC. Among 11,000 N. crassa genes, a total of 295 genes, including a large fraction of phosphatases/kinases, appear to be under the immediate control of the FRQ oscillator, as validated by 4 independent microarray experiments. Ribosomal RNA processing and assembly, rather than its transcription, appears to be under clock control, suggesting a new mechanism for the post-transcriptional control of clock-controlled genes

    A Genome-Wide Gene Function Prediction Resource for Drosophila melanogaster

    Predicting gene functions by integrating large-scale biological data remains a challenge for systems biology. Here we present a resource for Drosophila melanogaster gene function predictions. We trained function-specific classifiers to optimize the influence of different biological datasets for each functional category. Our model predicted GO terms and KEGG pathway memberships for Drosophila melanogaster genes with high accuracy, as affirmed by cross-validation, supporting literature evidence, and large-scale RNAi screens. The resulting resource of prioritized associations between Drosophila genes and their potential functions offers a guide for experimental investigations